Efficient Probabilistic Latent Semantic Indexing using Graphics Processing Unit
Abstract
In this paper, we propose a scheme to accelerate Probabilistic Latent Semantic Indexing (PLSI), an automated document indexing method based on a statistical latent semantic model, by exploiting the high parallelism of the Graphics Processing Unit (GPU). Our proposal comprises three techniques: the first accelerates the Expectation-Maximization (EM) computation by applying GPU matrix-vector multiplication; the second uses the same principle as the first but exploits the sparseness of word-document co-occurrences; and the third uses concurrent kernel execution, available on the NVIDIA Fermi architecture, to speed up the process. We compare the performance of the proposed scheme with a non-parallelized implementation. The results show that our method can be more than 100 times faster than the CPU-based implementation in our environment. By exploiting the sparseness of the data, we can not only process more documents and words on the GPU but also keep more data in device memory, avoiding massive data transfers between the host and the device that would otherwise degrade execution performance.
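The abstract does not give the update equations, so the following is only a minimal CPU sketch of the standard PLSI EM iteration over sparse co-occurrence counts, not the authors' GPU implementation. It assumes the symmetric parameterization P(d, w) = sum over z of P(z) P(d|z) P(w|z); the function name `plsi_em`, the dictionary-based sparse format, and the toy data are all hypothetical. It illustrates why sparseness matters: the loop touches only the nonzero (document, word) pairs rather than the full document-by-word matrix.

```python
import math
import random


def plsi_em(counts, n_docs, n_words, n_topics, n_iters=30, seed=0):
    """Fit PLSI by EM over sparse counts {(doc, word): n}.

    Hypothetical helper for illustration; returns
    (p_z, p_d_z, p_w_z, log_likelihood_history).
    """
    rng = random.Random(seed)

    def random_dist(n):
        # Strictly positive random distribution of length n.
        raw = [rng.random() + 0.1 for _ in range(n)]
        total = sum(raw)
        return [v / total for v in raw]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]   # p_d_z[z][d]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]  # p_w_z[z][w]
    history = []

    for _ in range(n_iters):
        acc_z = [0.0] * n_topics
        acc_d = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w = [[0.0] * n_words for _ in range(n_topics)]
        log_lik = 0.0

        # Visit only nonzero co-occurrences (the sparse representation).
        for (d, w), n in counts.items():
            # E-step: unnormalized posterior over topics for this pair.
            joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                     for z in range(n_topics)]
            total = sum(joint)
            log_lik += n * math.log(total)
            # Accumulate expected counts for the M-step.
            for z in range(n_topics):
                r = n * joint[z] / total
                acc_z[z] += r
                acc_d[z][d] += r
                acc_w[z][w] += r

        # M-step: renormalize expected counts into probabilities.
        grand = sum(acc_z)
        for z in range(n_topics):
            p_d_z[z] = [v / acc_z[z] for v in acc_d[z]]
            p_w_z[z] = [v / acc_z[z] for v in acc_w[z]]
        p_z = [v / grand for v in acc_z]
        history.append(log_lik)

    return p_z, p_d_z, p_w_z, history


# Toy corpus (made up): documents 0-1 use words 0-1, documents 2-3 use words 2-3.
counts = {(0, 0): 4, (0, 1): 3, (1, 0): 2, (1, 1): 5,
          (2, 2): 4, (2, 3): 3, (3, 2): 5, (3, 3): 2}
p_z, p_d_z, p_w_z, history = plsi_em(counts, n_docs=4, n_words=4, n_topics=2)
```

In a GPU version, the per-pair work inside the loop is independent across nonzero entries, which is the kind of parallelism the abstract refers to; how the authors map it onto matrix-vector products and concurrent kernels is not specified here.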
Similar resources
High order pLSA for indexing tagged images
This work presents a method for the efficient indexing of tagged images. Tagged images are a common resource of social networks and occupy a large portion of the social media stream. Their basic characteristic is the co-existence of two heterogeneous information modalities i.e. visual and tag, which refer to the same abstract meaning. This multi-modal nature of tagged images makes their efficie...
Parallel Implementations of Probabilistic Latent Semantic
Probabilistic Latent Semantic Analysis (PLSA) has been successfully applied to many text mining tasks such as retrieval, clustering, summarization, etc. PLSA involves iterative computation for a large number of parameters and may take hours or even days to process a large dataset, thus speeding up PLSA is highly motivated in the domain of text mining. Recently, the general purpose graphic proce...
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices
In this paper, we propose an acceleration of collapsed variational Bayesian (CVB) inference for latent Dirichlet allocation (LDA) by using Nvidia CUDA compatible devices. While LDA is an efficient Bayesian multi-topic document model, it requires complicated computations for parameter estimation in comparison with other simpler document models, e.g. probabilistic latent semantic indexing, etc. T...
Massively Parallel Latent Semantic Analyses using a Graphics Processing Unit
Latent Semantic Analysis (LSA) aims to reduce the dimensions of large Term-Document datasets using Singular Value Decomposition. However, with the ever-expanding size of data sets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel problems much faster than the traditional sequ...
Probabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis (pLSA) is a technique from the category of topic models. Its main goal is to model cooccurrence information under a probabilistic framework in order to discover the underlying semantic structure of the data. It was developed in 1999 by Th. Hofmann [7] and it was initially used for text-based applications (such as indexing, retrieval, clustering); however i...
Publication date: 2011